Managing user identities in a managed multi-tenant service

ABSTRACT

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data in a multi-tenant system. One of the methods includes receiving a data processing job associated with a user account of a user; determining to launch the data processing job on one or more cloud clusters of a cloud services provider; identifying a mirror account corresponding to the user, wherein the mirror account defines which cloud resources of the cloud services provider the user is permitted to access; obtaining a key for the mirror account; sending a request to launch the data processing job on the one or more cloud clusters, comprising sending data characterizing the data processing job, the mirror account of the user, and the obtained key to the one or more cloud clusters; and receiving output data associated with the data processing job from the one or more cloud clusters.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/845,000, filed on Apr. 9, 2020, and claims the benefit under 35 U.S.C. § 119(e) of the filing date of U.S. Patent Application No. 62/831,659, which was filed on Apr. 9, 2019.

The disclosures of the foregoing applications are incorporated here by reference.

BACKGROUND

This specification relates to user identities in data processing in a multi-tenant environment.

In a conventional multi-tenant, on-premises-only model, multiple users, e.g., of an enterprise, perform ad-hoc processing on clusters within the enterprise network, for example, at one or more data centers of the enterprise. Typically, the same user credentials are used for both user access to enterprise services and access to clusters.

In a conventional multi-tenant, cloud-only model, a third-party cloud provider hosts clusters that can be used to perform processing. Some enterprises using a cloud for data processing rely on a single service account to run all data processing jobs. Thus, regardless of the individual user account initiating the job at the enterprise, the cloud data accesses are all associated with the same service account.

These third-party cloud providers may also provide other services, for example, e-mail, calendaring, and various software-as-a-service applications.

SUMMARY

This specification describes technologies for managing identities in a multi-tenant environment where tasks are executed on behalf of a user in a non-interactive environment, e.g., in a hybrid on-premises and cloud architecture. The hybrid architecture includes on-premises clusters and cloud clusters for data processing. In addition, the cloud can provide other services, e.g., e-mail and other cloud-based applications. When the user is present, an authentication token, e.g., a Kerberos “delegation token,” is normally used. However, when unattended tasks are launched on a cloud cluster on a user's behalf and these tasks require access to a second system, it is important to avoid also providing access to other systems, such as the other cloud services. Using techniques described in this specification, user data processing jobs on the cloud clusters can be performed using a mirror account generated for each user, where the mirror account mirrors an enterprise account of the user. The mirror account can be transparent to the user while providing authorization, authentication, and auditing for cloud data processing jobs.

The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. A mirror account can be associated with each user's enterprise account for use with data processing jobs performed on cloud clusters of a multi-service cloud environment. This can provide authorization, authentication, and auditing of data accesses on the cloud clusters without risk of revealing user data for other services provided by the cloud. The mirror accounts, in contrast with a single service account for all users, can each be tailored with specific credentials for data access. Furthermore, a unique mirror account for each user allows simple auditing of data accesses, which is more difficult with a single super-user service account acting on behalf of all users. The creation and use of mirror accounts can be transparent to the users, so that users do not need to learn another username and password to perform cloud-based data processing tasks. With the use of a mirror account, a potential compromise of user credentials is limited to access to only one service for a short period of time and does not expose access to other services.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example hybrid on-premises and cloud architecture.

FIG. 2 is a diagram illustrating various user accounts for accessing different services.

FIG. 3 is a flow diagram of an example process for credentialing a new user.

FIG. 4 is a flow diagram of an example process for initiating a data processing job using a mirror account.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of an example hybrid on-premises and cloud architecture 100. In the hybrid on-premises and cloud architecture 100, some portion of the data processing of an enterprise happens on on-premises clusters 102 and 104 located in data centers 106 and 108, respectively, of an enterprise environment 101. Additionally, some other portion of the data processing of the enterprise occurs on cloud clusters 112 located in a cloud environment 110.

The clusters 102, 104, and 112 can be computational clusters of many computing devices used to process large data sets in a distributed environment. In some implementations, one or more of the clusters are Hadoop clusters.

Additionally, the enterprise environment 101 includes enterprise servers 103 through which users can access the clusters of the data centers or the cloud. Thus, the architecture 100 is also a multi-tenant architecture having multiple users who co-exist and who can access the same resources in the clusters at the same time.

The cloud environment 110 is provided by a third party distinct from the enterprise environment 101 and may provide data processing through clusters 112 and other services 116 to many different enterprises. The services 116 can include cloud-based enterprise services, e.g., e-mail, calendaring, and conferencing. The enterprise services can also include software-as-a-service applications, e.g., word processing, slide presentations, and spreadsheets, as well as storage for documents generated for each service. Users can direct processing jobs to run either on one of the on-premises clusters 102 or 104 or on the cloud clusters 112.

In managing the data processing jobs, it can be valuable to maintain authentication, authorization, and auditing for the processing jobs running on both the on-premises clusters and the cloud clusters. Authentication refers to the process of verifying the identity of a user or a process. It can be important to verify the identity of each user performing processing jobs on both on-premises clusters and cloud clusters. Authorization refers to determining whether a user has permission to perform an action, e.g., determining whether a user has permission to access the data for a processing job on a cluster. Auditing refers to having a log or a trail of actions performed, e.g., for determining which users performed which processing jobs on a cluster, or determining which user accessed which specific pieces of data and when.

On-Premises User Identities

Within an enterprise environment, e.g., the enterprise environment 101, users are typically assigned a unique account used to identify the user and access particular network resources. For example, a user can be assigned a Lightweight Directory Access Protocol (LDAP) account; LDAP provides directory services that allow various information to be shared through the network, e.g., an enterprise intranet.

In some cases, other user identities can be assigned for specific tasks. For example, users that perform data processing tasks using on-premises clusters, e.g., the clusters 102 and 104, can be assigned a Unix account. There is often a one-to-one mapping between the LDAP accounts and the Unix accounts so that the user does not need to log into separate accounts to access the respective functions. In some implementations, users that have finalized particular programming code can move the job to run in a production environment. The processes for the final production jobs can be scheduled to run as a Unix service account that is not associated with any particular individual user identity.

Cloud Identities

An enterprise can also use cloud-based services, e.g., the services 116, that provide cloud-based e-mail, calendaring, and applications. A separate cloud services account can be assigned to each user for accessing cloud services. For example, an enterprise can use cloud-based e-mail where each user of the enterprise is assigned a cloud services account. This cloud services account can also be given the same account name as the user's LDAP account. However, the passwords are typically different. The account authentication for the cloud services can be managed by the cloud services provider, e.g., a third-party entity. The cloud services account can provide access to all user data stored as part of the cloud services.

The entity providing the cloud-based services can also provide computing clusters for data processing, e.g., the cloud clusters 112.

One option for running the enterprise's data processing jobs on the cloud clusters 112 is to use a single service account for the entire enterprise. That is, a single service account can be provisioned to run all data processing jobs of the enterprise on the cloud clusters 112. However, in such a case, auditing can be difficult because there is no direct audit trail of which users are running the data processing jobs; rather, all job requests and data access requests appear to be performed by the one service account. Additionally, since all data has to be accessible by the single service account, authorization can be difficult; that is, it can be difficult to give different users different permissions for accessing cloud data.

An alternative to using a single service account is to use the individual cloud services accounts that are already assigned to the users. This can provide authentication and auditing of data access in the cloud.

However, when a data processing job runs on a cloud cluster, the job often requires the cloud cluster to access data stored in the cloud. If authentication is required to access the data, then the credentials for access, e.g., the credentials of the cloud services account of the user who launched the data processing job, need to be made available to one or more virtual machines of the cluster performing the data processing job. If the credentials for the cloud services account of the user who launched the data processing job are used for authentication and made available to the virtual machines, then those credentials might be vulnerable to theft, e.g., by an adversary who gains access to the virtual machines or by another user of the virtual machines. For example, any user with administrative access to the virtual machine could obtain the credentials for the cloud services account of a user running a job on the virtual machine, assume the identity of that cloud services account, and gain access to all of the cloud services of that user, e.g., the services 116. This could allow the administrative user, for example, to view e-mails or documents of the cloud services account.

Identities in a Hybrid On-Premises and Cloud Model

For each user, a mirror account can be generated that has a one-to-one mapping with the user's enterprise account, e.g., the user's LDAP account for the enterprise. The mirror account can identify the user associated with the corresponding enterprise account, and can be used to execute jobs launched by the user on the cloud clusters, providing authentication, authorization, and auditing in a multi-tenant hybrid environment of an on-premises and cloud architecture. Additionally, by using mirror accounts separate from cloud services accounts, the security risk of administrative access to the user's cloud services data is reduced.

FIG. 2 is a diagram 200 illustrating various user accounts of a user 201 for accessing computing resources of an enterprise of the user and for accessing different services of a cloud services provider of the enterprise. The user 201 accesses cloud services 202 using a cloud services account 203. The user 201 also accesses enterprise processing resources 204 of the enterprise, e.g., processing resources that include one or more on-premises clusters, using an enterprise account 205, e.g., a Unix or other suitable account. As noted above, the cloud services account 203 and the enterprise account 205 may have the same account name; however, the account authentication for the cloud services account 203 is managed by the cloud services provider, while the account authentication for the enterprise account 205 can be managed by the enterprise. A data processing request from the user 201 that is directed to the enterprise processing resources 204 can be run on an on-premises cluster using the credentials of the enterprise account 205, or can be run on cloud clusters 206.

The cloud services provider can generate a mirror account 210 for the user 201 that provides access to particular cloud data for data processing jobs performed by the cloud clusters 206. In some implementations, the enterprise processing resources 204 can generate the mirror account 210. This mirror account can map to the corresponding enterprise account 205 of the user.

When the enterprise processing resources 204 receive a request to launch a data processing job of the user 201, the enterprise processing resources 204 can determine whether the data processing job should be executed on on-premises clusters or on the cloud clusters 206. In some implementations, the request for launching the data processing job submitted by the user 201 identifies which type of cluster should execute the job; in some other implementations, the enterprise processing resources 204 can make the determination without user input, e.g., according to the availability of the on-premises clusters. If the enterprise processing resources 204 determine to execute the data processing job using on-premises clusters, then the enterprise processing resources can authorize and audit the job using the enterprise account 205. If the enterprise processing resources 204 determine to execute the data processing job using the cloud clusters 206, then the enterprise processing resources 204 can route the job request to the cloud clusters 206 using the mirror account 210. The requests sent by the enterprise processing resources 204 to the cloud clusters 206 can be authenticated using a password, represented in FIG. 2 by a key 207 of the mirror account 210, e.g., a JSON key.
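The routing decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the enterprise processing resources 204: the `JobRequest` type, the `"on_prem"`/`"cloud"` target labels, the free-slot availability rule, and the dictionaries standing in for the mirror-account directory and key store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobRequest:
    """Hypothetical job request; field names are illustrative only."""
    enterprise_account: str       # e.g. "newuser1"
    job_spec: str                 # opaque description of the job
    target: Optional[str] = None  # "on_prem", "cloud", or None to let the system decide


def choose_target(request: JobRequest, free_on_prem_slots: int) -> str:
    """Honor an explicit user choice; otherwise apply an assumed availability rule."""
    if request.target in ("on_prem", "cloud"):
        return request.target
    # Assumed rule: prefer on-premises clusters while they have free capacity.
    return "on_prem" if free_on_prem_slots > 0 else "cloud"


def route(request: JobRequest, free_on_prem_slots: int,
          mirror_accounts: Dict[str, str], key_store: Dict[str, str]) -> dict:
    """Pick the identity (and, for the cloud, the key) used to launch the job."""
    if choose_target(request, free_on_prem_slots) == "on_prem":
        # On-premises jobs are authorized and audited with the enterprise account itself.
        return {"target": "on_prem", "identity": request.enterprise_account}
    # Cloud jobs run as the user's mirror account, authenticated with its key.
    mirror = mirror_accounts[request.enterprise_account]
    return {"target": "cloud", "identity": mirror, "key": key_store[mirror]}
```

In this sketch the user's cloud services account never appears in the cloud branch; only the mirror account and its key are handed to the cloud clusters.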

The cloud clusters 206 can then authorize and audit the data processing job using the mirror account 210. For example, the cloud clusters 206 can generate logs of the requests for cloud data submitted by the data processing job, and associate the logs with the user 201 associated with the mirror account 210. The cloud clusters 206 can also determine, for each request for cloud data submitted by the data processing job, whether the user 201 has access to the cloud data being requested. In particular, neither the cloud clusters 206 nor any other users of the cloud clusters 206, e.g., an administrative user of the cloud clusters 206, can use the mirror account 210 to access the cloud services 202 of the user 201. This ensures that the user data of the cloud services 202 is secure even if credentials for the mirror account 210 are obtained by another user of the cloud clusters 206.
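A cluster-side check that both authorizes and audits each data request of a job might look like the following sketch. The grant table (mirror account to permitted path prefixes), the prefix-matching rule, and the log record fields are assumptions made for illustration; actual cloud providers expose their own identity and audit-logging services, which this does not model.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClusterDataGateway:
    """Hypothetical gateway through which a job's data requests pass."""
    # mirror account -> path prefixes that account may read (assumed grant model)
    grants: Dict[str, Set[str]]
    audit_log: List[dict] = field(default_factory=list)

    def request_data(self, mirror_account: str, path: str) -> bool:
        # Authorization: the request is allowed only under a granted prefix.
        allowed = any(path.startswith(p) for p in self.grants.get(mirror_account, set()))
        # Auditing: every request, allowed or denied, is logged under the mirror
        # account, whose name identifies the user who launched the job.
        self.audit_log.append({
            "time": time.time(),
            "account": mirror_account,
            "path": path,
            "allowed": allowed,
        })
        return allowed
```

Because the grant table in this sketch lists only data-processing paths, a request for, say, the user's mail data is denied and logged, mirroring the separation between the mirror account 210 and the cloud services 202.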

The cloud services provider can generate the mirror account 210 in a manner that is transparent to the user 201. The user 201 does not need to know about the details of the mirror account 210 or its account credentials. Additionally, the passwords for the mirror account 210, e.g., the key 207, can be generated and periodically rotated by the system, e.g., by the enterprise processing resources 204 or the cloud clusters 206. As a result, individual users do not need to know the credentials of their respective mirror accounts or enter those credentials to perform data processing. The users also do not need to know that it is the mirror account that is the identity accessing the data on the cloud when running their data processing jobs. Data characterizing the mirror accounts of the enterprise, and the credentials for the mirror accounts, can be stored on-premises by the enterprise processing resources 204.

FIG. 3 is a flowchart of an example process 300 for generating credentials for a new user of an enterprise. For convenience, the process 300 will be described as being performed by a system of one or more computers, located in one or more locations, and programmed appropriately in accordance with this specification.

The system generates an account identifier for the user (302). For example, the account identifier can be for a new enterprise account of the user. The enterprise account can be a new account generated when the user joins the enterprise, for example, an LDAP account having a unique account name, e.g., newuser1. A corresponding cloud services account can be generated at the same time for accessing cloud-hosted services of the enterprise, e.g., e-mail, provided by a cloud services provider. The cloud services account can, for convenience, be generated with the same account name, e.g., newuser1, but will generally have a different password than the enterprise account.

The system associates the enterprise account with a group of users of the enterprise having access to cloud data processing (304). For example, the system can associate the user's enterprise account with particular organizational groups of the enterprise based on the user's role in the enterprise. These organizational groups can each work with, and have access to, different portions of enterprise data stored or processed in cloud clusters of the cloud services provider.

The system can determine whether the enterprise account has a mirror account associated with it. In response to determining that the enterprise account does not have a corresponding mirror account, the system generates a mirror account for the user (306). The mirror account can be used by on-premises computing resources of the enterprise to launch computing jobs of the user onto the cloud clusters of the cloud services provider. The mirror account can be constructed in a way that readily identifies the corresponding enterprise account. For example, the account name of the enterprise account can be included within the mirror account name, e.g., newuser1@enterprisegrp.iam.mirroraccount.com. By including the enterprise account name in the mirror account name, auditing can quickly map requests for cloud cluster data to the enterprise accounts associated with those data access requests.
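Step 306 and the naming convention can be sketched as follows, assuming the example domain `enterprisegrp.iam.mirroraccount.com` from the text and a plain dictionary as the record of existing mirror accounts; the function names are hypothetical. The reverse mapping shows why embedding the enterprise account name simplifies auditing.

```python
MIRROR_DOMAIN = "enterprisegrp.iam.mirroraccount.com"  # example domain from the text


def mirror_account_name(enterprise_account: str, domain: str = MIRROR_DOMAIN) -> str:
    """Embed the enterprise account name in the mirror account name."""
    if not enterprise_account or "@" in enterprise_account:
        raise ValueError(f"invalid enterprise account name: {enterprise_account!r}")
    return f"{enterprise_account}@{domain}"


def enterprise_account_of(mirror_account: str, domain: str = MIRROR_DOMAIN) -> str:
    """Recover the enterprise account from a mirror account name, e.g. during an audit."""
    local, sep, host = mirror_account.partition("@")
    if not sep or host != domain or not local:
        raise ValueError(f"not a mirror account: {mirror_account!r}")
    return local


def ensure_mirror_account(enterprise_account: str, existing: dict) -> str:
    """Step 306: create a mirror account only if the user does not already have one."""
    if enterprise_account not in existing:
        existing[enterprise_account] = mirror_account_name(enterprise_account)
    return existing[enterprise_account]
```

Calling `ensure_mirror_account` twice for the same user returns the same name without creating a second account, which keeps the one-to-one mapping intact.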

The system establishes credentials for the mirror account of the user (308). The credentials can include a particular set of permissions defining the cloud data that the user is permitted to access. The system can establish the credentials for the mirror account of the user according to the different enterprise groups with which the enterprise account of the user is associated. Thus, the mirror account of each user can be given credentials tailored to the particular groups of which the user is a member, based on the types of data processing jobs that members of those groups are allowed to perform.

The system can also specify a user storage area for storing data associated with the cloud processing jobs of the user separately from data of other users.
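One way to derive the mirror account's permissions (308) from group membership, together with a per-user storage area, is sketched below. The group names, the `"read:"`/`"write:"` permission strings, and the `users/<account>/` storage prefix are hypothetical placeholders, not a format defined by this specification or by any cloud provider.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical mapping from organizational groups to the cloud data each may use.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "marketing-analytics": {"read:datasets/marketing/"},
    "finance": {"read:datasets/finance/", "write:datasets/finance/reports/"},
}


def user_storage_area(enterprise_account: str) -> str:
    """A per-user prefix that keeps one user's job data apart from other users' data."""
    return f"users/{enterprise_account}/"


def mirror_credentials(enterprise_account: str, groups: Iterable[str]) -> FrozenSet[str]:
    """Union of the permissions of every group the user belongs to, plus own storage."""
    perms: Set[str] = set()
    for group in groups:
        perms |= GROUP_PERMISSIONS.get(group, set())  # unknown groups grant nothing
    area = user_storage_area(enterprise_account)
    perms |= {f"read:{area}", f"write:{area}"}
    return frozenset(perms)
```

Taking the union over groups means a user in several groups gets exactly the data access those groups need and nothing beyond it.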

The system generates one or more keys for the mirror account. The keys are stored by the enterprise, e.g., in a secure key store of the enterprise processing resources, and retrieved when a user data processing job is sent to the cloud clusters. The keys are stored such that they can only be retrieved in association with the particular mirror account.

Importantly, the mirror account does not have access to the cloud services provided by the cloud services provider that are not related to the cloud clusters and the launched data processing job. That is, the mirror account cannot access the cloud services that the user can access using the cloud services account generated for the user. Thus, even if the mirror account is compromised, the user data maintained by these other cloud services remains secure.

FIG. 4 is a flowchart of an example process 400 for initiating a data processing job using a mirror account. For convenience, the process 400 will be described as being performed by a system of one or more computers, located in one or more locations, and programmed appropriately in accordance with this specification.

The system receives a data processing job from a user of an enterprise (402). For example, the user can submit the job to a data processing management system of the enterprise that manages data processing jobs, for example, by scheduling a portion of the processing jobs to be performed by on-premises clusters and by providing a different portion of the processing jobs to cloud clusters of a cloud services provider. In some implementations, the user request specifies whether the data processing job should be executed on an on-premises cluster or on a cloud cluster. In some other implementations, the data processing management system determines where to send the data processing job based on the resources needed to perform the job.

For a job to be performed by cloud clusters, the system identifies a mirror account of the requesting user (404). The mirror account can be identified based on the enterprise account name of the user. In particular, as described above, the mirror account can be generated for the user with a name that includes the enterprise account name. In some other implementations, the system looks up the mirror account of the user in a directory that associates enterprise account names (or Unix account names for on-premises data processing) with mirror accounts.
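Both identification strategies for step 404 can be combined in a small lookup, sketched here under the assumption that the directory is an in-memory mapping that may contain both enterprise (LDAP) names and Unix names; a deployment would more likely query a directory service. The function name, the domain constant, and the choice to fail rather than fall back when a directory is supplied are illustrative.

```python
from typing import Mapping, Optional

MIRROR_DOMAIN = "enterprisegrp.iam.mirroraccount.com"  # example domain from the text


def identify_mirror_account(account_name: str,
                            directory: Optional[Mapping[str, str]] = None) -> str:
    """Return the mirror account for an enterprise or Unix account name."""
    if directory is not None:
        # Directory strategy: enterprise and Unix names both map to the mirror account.
        if account_name in directory:
            return directory[account_name]
        raise LookupError(f"no mirror account registered for {account_name!r}")
    # Naming strategy: the mirror account name embeds the enterprise account name.
    return f"{account_name}@{MIRROR_DOMAIN}"
```

For example, a directory entry mapping a hypothetical Unix name `nu1` to `newuser1@enterprisegrp.iam.mirroraccount.com` lets a job submitted under the Unix identity run as the same mirror account.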

The system retrieves a current key for the mirror account (406). The key is a password for the mirror account that provides authentication to the cloud clusters and can be stored by the system in a secure key store. Thus, the system can retrieve the key rather than request password input from the user. The key for the mirror account can be changed periodically to enhance the security of the mirror account. In some implementations, a set of keys is rotated periodically, e.g., every particular number of days. That is, new keys can be generated periodically and the oldest keys phased out.
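The retrieval and rotation policy of step 406 could be modeled as follows. This is a sketch under assumptions: the 30-day period, keeping two keys valid at once, and generating key material with Python's `secrets.token_urlsafe` are illustrative choices, and a real deployment would keep keys in a secure key store rather than in process memory. Keeping the previous key valid for one period is one way to let jobs launched just before a rotation continue to authenticate.

```python
import secrets
from datetime import datetime, timedelta
from typing import List, Tuple


class RotatingKeySet:
    """Hypothetical per-mirror-account key set with time-based rotation."""

    def __init__(self, rotation_period: timedelta = timedelta(days=30), max_keys: int = 2):
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self.rotation_period = rotation_period
        self.max_keys = max_keys
        self._keys: List[Tuple[datetime, str]] = []  # (created_at, key), oldest first

    def _rotate(self, now: datetime) -> None:
        # Generate a new random key and phase out the oldest keys beyond max_keys.
        self._keys.append((now, secrets.token_urlsafe(32)))
        del self._keys[:-self.max_keys]

    def current(self, now: datetime) -> str:
        """Return the newest key, rotating first if it is older than the period."""
        if not self._keys or now - self._keys[-1][0] >= self.rotation_period:
            self._rotate(now)
        return self._keys[-1][1]

    def valid_keys(self) -> List[str]:
        """Keys still accepted, oldest first."""
        return [key for _, key in self._keys]
```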

The system sends the data processing job to the cloud clusters using the mirror account identifier and key (408). The mirror account identifier provides information allowing the cloud clusters to determine whether the account is authorized to access the data for the job. The key provides authentication of the mirror account.

If the user is properly authorized and authenticated, the cloud clusters perform the data processing job. The system receives output data from the cloud clusters (410). The output data can depend on the particular data processing job performed by the cloud clusters.
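Steps 404 through 410 can be strung together as in the sketch below. The `submit` callable stands in for whatever client actually communicates with the cloud clusters; it, the `LaunchRequest` fields, and the dictionary-based directory and key store are assumptions rather than an API defined by this specification.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class LaunchRequest:
    """Data sent to the cloud clusters in step 408 (field names are hypothetical)."""
    job_spec: str
    mirror_account: str
    key: str


def launch_on_cloud(enterprise_account: str, job_spec: str,
                    directory: Mapping[str, str], key_store: Mapping[str, str],
                    submit: Callable[[LaunchRequest], str]) -> str:
    """Steps 404-410 of process 400, with the transport abstracted as `submit`."""
    mirror = directory[enterprise_account]          # 404: identify the mirror account
    key = key_store[mirror]                         # 406: fetch its current key
    request = LaunchRequest(job_spec, mirror, key)  # 408: job, identity, and key together
    return submit(request)                          # 410: output data from the clusters
```

In a test, `submit` can be a fake cluster that compares `request.key` with the expected key before returning output, which exercises the authentication path without any cloud dependency.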

Embodiments of the subject matter include methods and corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.

In addition to the embodiments described above, the following embodiments are also innovative:

Embodiment 1 is a method comprising: receiving a data processing job associated with a user account of a user; determining to launch the data processing job on one or more cloud clusters of a cloud services provider; identifying a mirror account corresponding to the user, wherein the mirror account defines which cloud resources of the cloud services provider the user is permitted to access; obtaining a key for the mirror account; sending a request to launch the data processing job on the one or more cloud clusters, comprising sending data characterizing the data processing job, the mirror account of the user, and the obtained key to the one or more cloud clusters, wherein the obtained key authenticates the request with the one or more cloud clusters; and receiving output data associated with the data processing job from the one or more cloud clusters.
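The flow of Embodiment 1 can be illustrated with a minimal Python sketch. It assumes a dictionary that maps enterprise user accounts to mirror accounts, a `get_key` callable standing in for the key lookup, and a `send` callable standing in for the cloud cluster; the names `DataProcessingJob`, `LaunchRequest`, and `launch_on_cloud` are hypothetical and are not an API defined by this specification or by any cloud services provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataProcessingJob:
    job_id: str
    user_account: str  # enterprise user account that submitted the job
    payload: str       # hypothetical: an opaque description of the job


@dataclass
class LaunchRequest:
    job: DataProcessingJob
    mirror_account: str
    key: str


def launch_on_cloud(
    job: DataProcessingJob,
    mirror_accounts: Dict[str, str],
    get_key: Callable[[str], str],
    send: Callable[[LaunchRequest], List[str]],
) -> List[str]:
    """Launch a job on cloud clusters under the submitting user's mirror account."""
    # Identify the mirror account that corresponds to the user.
    mirror = mirror_accounts.get(job.user_account)
    if mirror is None:
        raise PermissionError(f"no mirror account for {job.user_account}")
    # Obtain the key for the mirror account, e.g. from a secure key store.
    key = get_key(mirror)
    # Send the job data, the mirror account, and the key in one request;
    # the cluster uses the key to authenticate the request.
    request = LaunchRequest(job=job, mirror_account=mirror, key=key)
    # The cluster (stubbed here by `send`) returns output data for the job.
    return send(request)
```

In a test, `send` can be a stub that compares `request.key` with the key it expects for `request.mirror_account` and raises `PermissionError` on a mismatch, which mimics the cluster-side authentication step.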

Embodiment 2 is the method of embodiment 1, wherein the one or more cloud clusters use the mirror account to authorize one or more requests submitted by the data processing job, the authorizing comprising determining whether the user is permitted to access data associated with the requests.

Embodiment 3 is the method of any one of embodiments 1 or 2, wherein the one or more cloud clusters use the mirror account to audit one or more requests submitted by the data processing job, the auditing comprising generating one or more logs associated with the requests and with the user.
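The authorization and auditing of Embodiments 2 and 3 can be sketched on the cluster side as follows. This is an illustration under stated assumptions, not the cluster's actual mechanism: it assumes permissions are a simple set of resource names per mirror account and that a mapping from mirror account back to the enterprise user is available; `ClusterAuthorizer` and `AuditRecord` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AuditRecord:
    mirror_account: str
    user: str
    resource: str
    allowed: bool


@dataclass
class ClusterAuthorizer:
    # Hypothetical: resources each mirror account may access.
    grants: Dict[str, Set[str]]
    # Hypothetical: the enterprise user behind each mirror account.
    owners: Dict[str, str]
    audit_log: List[AuditRecord] = field(default_factory=list)

    def authorize(self, mirror_account: str, resource: str) -> bool:
        """Decide whether a request from a job running as `mirror_account` may access `resource`."""
        allowed = resource in self.grants.get(mirror_account, set())
        # Record every decision, allowed or denied, attributed to the individual user.
        user = self.owners.get(mirror_account, "unknown")
        self.audit_log.append(AuditRecord(mirror_account, user, resource, allowed))
        return allowed
```

Because each job runs under a per-user mirror account rather than a shared service account, every audit record can name the individual user who initiated the request.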

Embodiment 4 is the method of any one of embodiments 1-3, wherein the cloud services provider provides one or more other cloud services to the user, and wherein the mirror account does not have access to data associated with the one or more other cloud services.

Embodiment 5 is the method of any one of embodiments 1-4, wherein obtaining the key for the mirror account comprises obtaining the key from a secure key store, and wherein the key is updated periodically.
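A key store that updates keys periodically, as in Embodiment 5, might be sketched as below. The in-memory `RotatingKeyStore` class is hypothetical and only stands in for a hardened secure key store; it assumes lazy rotation, where a key is reissued the first time it is read after the rotation period has elapsed. A scheduled rotation job would be an equally valid design.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class RotatingKeyStore:
    """Hypothetical in-memory stand-in for a secure key store with periodic key updates."""

    def __init__(self, rotation_period_s: float, clock: Callable[[], float] = time.monotonic):
        self._period = rotation_period_s
        self._clock = clock  # injectable so rotation can be tested without waiting
        self._keys: Dict[str, Tuple[str, float]] = {}  # mirror account -> (key, issued_at)

    def _issue(self, mirror_account: str) -> str:
        # Generate a fresh random key using a cryptographically secure source.
        key = secrets.token_urlsafe(32)
        self._keys[mirror_account] = (key, self._clock())
        return key

    def get_key(self, mirror_account: str) -> str:
        entry = self._keys.get(mirror_account)
        # Issue a key on first use, and rotate it once the period has elapsed.
        if entry is None or self._clock() - entry[1] >= self._period:
            return self._issue(mirror_account)
        return entry[0]
```

Passing a fake clock, e.g. `RotatingKeyStore(3600, clock=lambda: now[0])`, lets a caller advance time and observe that the key returned for a mirror account changes after the period.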

Embodiment 6 is the method of any one of embodiments 1-5, wherein determining to launch the data processing job on the one or more cloud clusters comprises identifying a user input associated with the data processing job, wherein the user input specifies that the data processing job should be launched on the one or more cloud clusters.

Embodiment 7 is the method of any one of embodiments 1-6, wherein determining to launch the data processing job on the one or more cloud clusters comprises determining an ability of one or more on-premises clusters of an enterprise of the user to execute the data processing job.

Embodiment 8 is the method of any one of embodiments 1-7, further comprising: receiving a second data processing job associated with a second user account of a second user; determining to launch the second data processing job on one or more on-premises clusters of an enterprise of the user; and executing the second data processing job on the one or more on-premises clusters using credentials associated with the second user account.
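The placement decision of Embodiments 6 to 8 can be sketched as a small routing function. The sketch assumes, purely for illustration, that the ability of the on-premises clusters to execute a job is measured by free CPU cores; the specification does not fix a metric, and `Job`, `Target`, `choose_target`, and `credentials_for` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Target(Enum):
    ON_PREMISES = "on_premises"
    CLOUD = "cloud"


@dataclass
class Job:
    job_id: str
    user_account: str
    required_cores: int  # hypothetical capacity measure
    requested_target: Optional[Target] = None  # explicit user input, if any


def choose_target(job: Job, free_on_prem_cores: int) -> Target:
    # An explicit user input takes precedence (Embodiment 6).
    if job.requested_target is not None:
        return job.requested_target
    # Otherwise, check whether the on-premises clusters can run the job (Embodiment 7).
    if free_on_prem_cores >= job.required_cores:
        return Target.ON_PREMISES
    return Target.CLOUD


def credentials_for(job: Job, target: Target, mirror_accounts: Dict[str, str]) -> str:
    # On-premises jobs run with the user's own enterprise credentials (Embodiment 8);
    # cloud jobs run under the user's corresponding mirror account.
    if target is Target.ON_PREMISES:
        return job.user_account
    return mirror_accounts[job.user_account]
```

Keeping the routing decision separate from credential selection means the same job record can be sent to either environment, with only the identity used for execution differing.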

Embodiment 9 is a method comprising: generating an account identifier for a user; associating the account identifier with one or more groups of users authorized to perform data processing on one or more cloud clusters; generating a corresponding mirror account for the account identifier, wherein the mirror account can be used to perform data processing jobs for the user on the one or more cloud clusters; and establishing credentials for the mirror account, comprising: defining access permissions to data stored on the one or more cloud clusters according to the one or more groups of users, and generating one or more passwords for the mirror account.
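The provisioning steps of Embodiment 9 can be sketched as follows, assuming a UUID as the account identifier, a `mirror-<username>` naming scheme, and a mapping from group names to the data each group may access on the cloud clusters. All of these, and the names `MirrorAccount` and `provision_user`, are illustrative assumptions. Consistent with Embodiment 10, the generated password is meant to be handed to a secure store rather than shown to the user.

```python
import secrets
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MirrorAccount:
    name: str
    account_id: str
    permissions: Set[str]
    password: str = field(repr=False)  # excluded from repr so it is not logged accidentally


def provision_user(
    username: str,
    groups: List[str],
    group_grants: Dict[str, Set[str]],
) -> MirrorAccount:
    # Generate an account identifier for the user.
    account_id = str(uuid.uuid4())
    # Derive the mirror account's access permissions from the user's groups.
    permissions: Set[str] = set()
    for group in groups:
        permissions |= group_grants.get(group, set())
    # Create the mirror account with a generated password the user never sees.
    return MirrorAccount(
        name=f"mirror-{username}",
        account_id=account_id,
        permissions=permissions,
        password=secrets.token_urlsafe(24),
    )
```

Deriving permissions from group membership means that changing a user's groups and re-running provisioning is enough to update what that user's cloud jobs can read.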

Embodiment 10 is the method of embodiment 9, wherein the mirror account and the one or more passwords for the mirror account are transparent to the user.

Embodiment 11 is the method of any one of embodiments 9 or 10, wherein the account identifier corresponds to an enterprise account of the user for an enterprise, and wherein the enterprise account of the user can be used to perform data processing jobs for the user on one or more on-premises clusters of the enterprise.

Embodiment 12 is the method of any one of embodiments 9-11, further comprising: generating a cloud services account for the user, wherein the cloud services account corresponds to one or more cloud services provided by a cloud services provider of the one or more cloud clusters; and establishing credentials for the cloud services account, wherein the credentials for the mirror account and the credentials for the cloud services account are different.

Embodiment 13 is the method of embodiment 12, wherein the access permissions corresponding to the mirror account do not provide access to the one or more cloud services provided by the cloud services provider.
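The separation required by Embodiments 12 and 13 can be expressed as a provisioning-time check. This is a hypothetical sketch: it models permissions and cloud services as sets of resource names, and it compares credentials directly for brevity, whereas a real deployment would more likely compare stored hashes or rely on the provider's identity service. `validate_account_separation` is not an API from the specification.

```python
import hmac
from typing import Set


def validate_account_separation(
    mirror_permissions: Set[str],
    cloud_service_resources: Set[str],
    mirror_credential: str,
    cloud_services_credential: str,
) -> None:
    """Raise ValueError if the mirror account overlaps the user's cloud services account."""
    # The mirror account must not reach the user's other cloud services (Embodiment 13).
    overlap = mirror_permissions & cloud_service_resources
    if overlap:
        raise ValueError(f"mirror account can reach cloud services resources: {sorted(overlap)}")
    # The two accounts must use different credentials (Embodiment 12);
    # compare_digest avoids leaking timing information during the comparison.
    if hmac.compare_digest(mirror_credential.encode(), cloud_services_credential.encode()):
        raise ValueError("mirror and cloud services accounts share a credential")
```

Running such a check whenever a mirror account is created or its permissions change would keep a compromised job credential from reaching the user's other cloud services.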

Embodiment 14 is the method of any one of embodiments 9-13, further comprising: storing the one or more passwords for the mirror account in a secure data store; and periodically updating the one or more passwords for the mirror account.

Embodiment 15 is a system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the method of any one of embodiments 1 to 14.

Embodiment 16 is a computer storage medium encoded with a computer program, the program comprising instructions that are operable, when executed by data processing apparatus, to cause the data processing apparatus to perform the method of any one of embodiments 1 to 14.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

1. (canceled)
2. A method comprising: generating data representing a user account for a user, wherein the user account is used to authenticate requests for accessing resources of an enterprise environment, and wherein a cloud services account of the user is used to authenticate requests for accessing a first set of resources of a cloud services provider; associating the user account of the user with one or more permissions to perform data processing on one or more cloud clusters of the cloud services provider; generating a mirror account corresponding to the user account of the user, wherein: the mirror account is used to authenticate requests for accessing a second set of resources of the cloud services provider, and the mirror account is distinct from the user account and the cloud services account of the user; and establishing credentials for the mirror account, comprising: defining access permissions to the second set of resources of the cloud services provider according to the one or more permissions associated with the user account, and generating one or more credentials for the mirror account.
3. The method of claim 2, wherein associating the user account of the user with one or more permissions to perform data processing on the one or more cloud clusters of the cloud services provider comprises: associating the user account of the user with one or more groups of users of the enterprise environment, wherein different groups of users have access to respective different portions of the second set of resources of the cloud services provider.
4. The method of claim 2, wherein the access permissions corresponding to the mirror account do not provide access to the first set of resources of the cloud services provider.
5. The method of claim 2, further comprising: storing the one or more credentials for the mirror account in a secure data store; and periodically updating the one or more credentials for the mirror account.
6. The method of claim 2, wherein the one or more cloud clusters use the mirror account to authorize one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the one or more requests being associated with at least a subset of the second set of resources of the cloud services provider, the authorizing comprising determining whether the user is permitted to access the subset.
7. The method of claim 2, wherein the one or more cloud clusters use the mirror account to audit one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the auditing comprising generating one or more logs associated with the requests and with the user.
8. The method of claim 2, further comprising: receiving a data processing job associated with the user account of the user; sending a request to launch the data processing job on at least one of the cloud clusters of the cloud services provider, the request being associated with one or more of the mirror account or the credentials for the mirror account; and receiving output data associated with the data processing job from the at least one cloud cluster.
9. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: generating data representing a user account for a user, wherein the user account is used to authenticate requests for accessing resources of an enterprise environment, and wherein a cloud services account of the user is used to authenticate requests for accessing a first set of resources of a cloud services provider; associating the user account of the user with one or more permissions to perform data processing on one or more cloud clusters of the cloud services provider; generating a mirror account corresponding to the user account of the user, wherein: the mirror account is used to authenticate requests for accessing a second set of resources of the cloud services provider, and the mirror account is distinct from the user account and the cloud services account of the user; and establishing credentials for the mirror account, comprising: defining access permissions to the second set of resources of the cloud services provider according to the one or more permissions associated with the user account, and generating one or more credentials for the mirror account.
10. The system of claim 9, wherein associating the user account of the user with one or more permissions to perform data processing on the one or more cloud clusters of the cloud services provider comprises: associating the user account of the user with one or more groups of users of the enterprise environment, wherein different groups of users have access to respective different portions of the second set of resources of the cloud services provider.
11. The system of claim 9, wherein the access permissions corresponding to the mirror account do not provide access to the first set of resources of the cloud services provider.
12. The system of claim 9, the operations further comprising: storing the one or more credentials for the mirror account in a secure data store; and periodically updating the one or more credentials for the mirror account.
13. The system of claim 9, wherein the one or more cloud clusters use the mirror account to authorize one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the one or more requests being associated with at least a subset of the second set of resources of the cloud services provider, the authorizing comprising determining whether the user is permitted to access the subset.
14. The system of claim 9, wherein the one or more cloud clusters use the mirror account to audit one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the auditing comprising generating one or more logs associated with the requests and with the user.
15. The system of claim 9, the operations further comprising: receiving a data processing job associated with the user account of the user; sending a request to launch the data processing job on at least one of the cloud clusters of the cloud services provider, the request being associated with one or more of the mirror account or the credentials for the mirror account; and receiving output data associated with the data processing job from the at least one cloud cluster.
16. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: generating data representing a user account for a user, wherein the user account is used to authenticate requests for accessing resources of an enterprise environment, and wherein a cloud services account of the user is used to authenticate requests for accessing a first set of resources of a cloud services provider; associating the user account of the user with one or more permissions to perform data processing on one or more cloud clusters of the cloud services provider; generating a mirror account corresponding to the user account of the user, wherein: the mirror account is used to authenticate requests for accessing a second set of resources of the cloud services provider, and the mirror account is distinct from the user account and the cloud services account of the user; and establishing credentials for the mirror account, comprising: defining access permissions to the second set of resources of the cloud services provider according to the one or more permissions associated with the user account, and generating one or more credentials for the mirror account.
17. The non-transitory computer storage media of claim 16, wherein associating the user account of the user with one or more permissions to perform data processing on the one or more cloud clusters of the cloud services provider comprises: associating the user account of the user with one or more groups of users of the enterprise environment, wherein different groups of users have access to respective different portions of the second set of resources of the cloud services provider.
18. The non-transitory computer storage media of claim 16, wherein the access permissions corresponding to the mirror account do not provide access to the first set of resources of the cloud services provider.
19. The non-transitory computer storage media of claim 16, the operations further comprising: storing the one or more credentials for the mirror account in a secure data store; and periodically updating the one or more credentials for the mirror account.
20. The non-transitory computer storage media of claim 16, wherein the one or more cloud clusters use the mirror account to authorize one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the one or more requests being associated with at least a subset of the second set of resources of the cloud services provider, the authorizing comprising determining whether the user is permitted to access the subset.
21. The non-transitory computer storage media of claim 16, wherein the one or more cloud clusters use the mirror account to audit one or more requests submitted by respective data processing jobs launched on the cloud clusters and associated with the mirror account, the auditing comprising generating one or more logs associated with the requests and with the user.
22. The non-transitory computer storage media of claim 16, the operations further comprising: receiving a data processing job associated with the user account of the user; sending a request to launch the data processing job on at least one of the cloud clusters of the cloud services provider, the request being associated with one or more of the mirror account or the credentials for the mirror account; and receiving output data associated with the data processing job from the at least one cloud cluster.